{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<style>\n",
    "pre {\n",
    " white-space: pre-wrap !important;\n",
    "}\n",
    ".table-striped > tbody > tr:nth-of-type(odd) {\n",
    "    background-color: #f9f9f9;\n",
    "}\n",
    ".table-striped > tbody > tr:nth-of-type(even) {\n",
    "    background-color: white;\n",
    "}\n",
    ".table-striped td, .table-striped th, .table-striped tr {\n",
    "    border: 1px solid black;\n",
    "    border-collapse: collapse;\n",
    "    margin: 1em 2em;\n",
    "}\n",
    ".rendered_html td, .rendered_html th {\n",
    "    text-align: left;\n",
    "    vertical-align: middle;\n",
    "    padding: 4px;\n",
    "}\n",
    "</style>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Frequently Asked Questions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### I have a massive CSV file which I can not fit all into memory at one time. How do I convert it to HDF5?\n",
    "\n",
    "We are working to make this process an easy one liner. In the meantime, consider this strategy: read the CSV file in chunks, and use `vaex` to export each chunk to disk. Since all resulting HDF5 files will have the same structure, one can use `vaex.open(part*)` to open all chunks as a single DataFrame. For a small performance improvement, that DataFrame can be exported to disk in a single large HDF5 file. \n",
    "\n",
    "Consider the following code example:\n",
    "\n",
    "```\n",
    "for i, chunk in enumerate(vaex.read_csv('/path/to/data/BigData.csv', chunksize=100_000)):\n",
    "    df_chunk = vaex.from_pandas(chunk, copy_index=False)\n",
    "    export_path = f'/path/to/data/part_{i}.hdf5'\n",
    "    df_chunk.export_hdf5(export_path)\n",
    "\n",
    "df = vaex.open('/path/to/data/part*')    \n",
    "df.export_hdf5('/path/to/data/Final.hdf5')\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Why can't I open a HDF5 file that was exported from a `pandas` DataFrame via the `.to_hdf`?\n",
    "\n",
    "When one uses the `pandas` `.to_hdf` method, the output HDF5 file has a row based format. `Vaex` on the other hand expects column based HDF5 files. This allows for more efficient reading of data column, which is much more commonly required for data science applications. \n",
    "\n",
    "One can easily export a `pandas` DataFrame to a `vaex` friendly HDF5 file:\n",
    "```\n",
    "vaex_df = vaex.from_pandas(pandas_df, copy_index=False)\n",
    "vaex_df.export_hdf5('my_data.hdf5')\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
